Author: Long Zhao
Reviewer: Ying Ge, Junyi Shen
从形式上复现原图。 Reproduce the original picture in form.
有时没有现成的包能画出一模一样的图,我们可以通过计算图中每个元素的位置,然后画出想要的效果。 Sometimes no ready-made package can draw exactly the same picture, we can calculate the position of each element in the graph and then draw the desired effect.
出自 From https://www.nature.com/articles/s41467-019-09222-w
Fig. 2 Characteristics of glycosites identified with AI-ETD. f A glycoprotein-glycan network maps which glycans (outer circle, 117 total) modify which proteins (inner bar, 771 total). Glycoproteins are sorted by number of glycosites (scale to the right). Glycans are organized by classification, and edges are colored by the glycan node from which they originate, except for mannose-6-phosphate which has yellow edges. See Supplementary Fig. 11 and Supplementary Table 1 for glycan identifiers.
A glycoprotein-glycan network diagram in Fig. 2f maps which glycans (outer nodes) were observed on identified glycoproteins (inner column, organized by number of glycosites). Several discernable patterns appear, perhaps most notably the prevalence of high mannose glycosylation. The network diagram also indicates that the majority of fucosylated, paucimannose, and sialylated glycans occur on proteins with multiple glycosylation sites, and it indicates which glycans contribute more to heterogeneity. Supplementary Figure 11 provides a larger version of this network diagram with glycan identities in Supplementary Table 1.
图的解析 Analysis of the figure
这个图展示两层信息:糖蛋白上glycan的种类,糖蛋白上glycosites的数量(数量可替换成其他分类信息,例如上调/下调,或所在的通路等等)。 This graph shows two layers of information: the type of glycan on the glycoprotein and the number of glycosites on the glycoprotein (the number can be replaced with other taxonomic information, such as up/downregulation, pathway, etc.).
原文方法描述是用igraph画的: The original method description is drawn with igraph: The protein-glycan network, glycan co-occurrence networks, and glycosylation profiles for subcellular groups were created in R 3.2.2 using the igraph library70, and the arcplot were created with arcdiagram library。
这里用ggplot2画图,先计算位置再画图。 Here we use ggplot2 to draw a picture, calculate the position first and then draw the picture.
展示分类、分组跟元素之间的关系。例如:glycan-数量-glycoprotein,GO-TF-targe gene、表观修饰-上下调-gene、enhancer-motif-promoter等。 Show the relationship between classification, grouping, and elements. For example: glycan-quantity-glycoprotein, GO-TF-targe gene, epimodifier-up-and-down-gene, enhancer-motif-promoter, etc.
以TF-GO-target gene互作为例,转录组找出几个GO 途径的基因可以放在外围的点,颜色代表不同GO term,点的大小代表基因表达水平或fold change。里面的矩形是我们共公共数据库找到的能够结合这些外围基因的转录因子。结果哪些转录因子激活基因,哪些抑制,就一目了然了。如果有HiC或ChIA-pet数据,hub 位点和靶基因的互作也是同理。 Taking TF-GO-target gene interaction as an example, the transcriptome finds several GO pathway genes that can be placed at peripheral points, the color represents different GO terms, and the size of the dots represents the gene expression level or fold change. The rectangle inside is the transcription factor that can bind these peripheral genes found in our public database. As a result, it is clear which transcription factors activate genes and which inhibit. If HiC or ChIA-pet data are available, the interaction between the hub site and the target gene is the same.
使用国内镜像安装包 Use the domestic mirror installation package
options("repos"= c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))
options(BioC_mirror="http://mirrors.ustc.edu.cn/bioc/")
加载包 Load the package
library(ggplot2) #用于画图 Used for drawing
library(dplyr) #用于数据处理 For data processing
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Sys.setenv(LANGUAGE = "en") #显示英文报错信息 Displays the error message in English
options(stringsAsFactors = FALSE) #禁止chr转成factor Prohibiting CHR from converting to factor
easy_input_node.txt和easy_input_group.txt。从例文41467_2019_9222_MOESM5_ESM.xlsx文件整理而来。 easy_input_node.txt and easy_input_group.txt. Organized from examples 41467_2019_9222_MOESM5_ESM.xlsx documents.
;分隔多个glycon。 easy_input_group.txt, the correspondence of protein-glycan-quantity, is used to draw the central rectangle and the connecting line. One protein per row, use; Separation of multiple glycons.Glycan <- read.table("easy_input_node.txt", sep = "\t", head=T)
head(Glycan)
## Node Glycan Gly.clu
## 1 1 HexNAc(1) Paucimannose
## 2 2 HexNAc(2) Paucimannose
## 3 3 HexNAc(2)Fuc(1) Paucimannose
## 4 4 HexNAc(2)Hex(1) Paucimannose
## 5 5 HexNAc(2)Hex(1)Fuc(1) Paucimannose
## 6 6 HexNAc(2)Hex(2) Paucimannose
dim(Glycan)
## [1] 117 3
protein <- read.table("easy_input_group.txt",head=T)
head(protein)
## Protein Count
## 1 Q9Z2W9 7
## 2 Q9Z2W8 5
## 3 Q9Z2L6 6
## 4 Q9Z2A9 1
## 5 Q9Z247 4
## 6 Q9Z218 2
## Glycan
## 1 HexNAc(6)Hex(3);HexNAc(2)Hex(8);HexNAc(2)Hex(7);HexNAc(2)Hex(6);HexNAc(2)Hex(5);HexNAc(6)Hex(4);HexNAc(4)Hex(4)NeuAc(1)
## 2 HexNAc(2)Hex(8);HexNAc(6)Hex(3);HexNAc(2)Hex(6);HexNAc(2)Hex(9);HexNAc(2)Hex(5)
## 3 HexNAc(2)Hex(6);HexNAc(2)Hex(5);HexNAc(4)Hex(3)Fuc(1);HexNAc(2)Hex(7);HexNAc(6)Hex(3);HexNAc(2)Hex(8)
## 4 HexNAc(2)Hex(5)
## 5 HexNAc(2)Hex(8);HexNAc(2)Hex(7);HexNAc(2)Hex(6);HexNAc(6)Hex(3)
## 6 HexNAc(3)Hex(6);HexNAc(2)Hex(5)
dim(protein)
## [1] 771 3
一共有117个点,右侧(第一、四象限)59个,左侧(二、三象限)58个。 There are 117 points in total, 59 on the right (1st and 4th quadrants) and 58 on the left (2nd and 3rd quadrants).
先计算每个点与x轴的角度,再通过与x轴的角度和三角函数计算出每个点的横纵坐标。 First, the angle of each point to the x-axis is calculated, and then the horizontal and vertical coordinates of each point are calculated by the angle to the x-axis and trigonometric functions.
计算角度时,因为不是整个圆,上下各缺了一块,缺的角度为30度。所以第一个蛋白角度是75度,第59个蛋白是-75度,第60个蛋白是-107度,最后一个是-255度。 When calculating the angle, because it is not the whole circle, there is a piece missing from the top and bottom, and the missing angle is 30 degrees. So the first protein angle is 75 degrees, the 59th protein is -75 degrees, the 60th protein is -107 degrees, and the last one is -255 degrees.
计算坐标时,考虑到后面还要画中央矩形,因此以蛋白数量/2作为半径。 When calculating the coordinates, consider that the central rectangle is also drawn at the back, so the number of proteins/2 is used as the radius.
Glycan$angle <- ifelse(Glycan$Node < 60,
75-(150/58*(Glycan$Node-1)),
75-(150/58*(Glycan$Node-1))-30)
Glycan$G.x <- cos(Glycan$angle*pi/180.0)*(nrow(protein)/2)
Glycan$G.y <- sin(Glycan$angle*pi/180.0)*(nrow(protein)/2)
head(Glycan)
## Node Glycan Gly.clu angle G.x G.y
## 1 1 HexNAc(1) Paucimannose 75.00000 99.77474 372.3644
## 2 2 HexNAc(2) Paucimannose 72.41379 116.47513 367.4831
## 3 3 HexNAc(2)Fuc(1) Paucimannose 69.82759 132.93825 361.8531
## 4 4 HexNAc(2)Hex(1) Paucimannose 67.24138 149.13056 355.4860
## 5 5 HexNAc(2)Hex(1)Fuc(1) Paucimannose 64.65517 165.01909 348.3948
## 6 6 HexNAc(2)Hex(2) Paucimannose 62.06897 180.57145 340.5939
外围的弧线内侧是圆半径的1.1倍,外侧是1.15倍 The inner side of the outer arc is 1.1 times the radius of the circle, and the outside is 1.15 times
# 取Gly.clu列 Take Gly.clu
curve <- Glycan[,3, drop=F]
curve$x.start <- Glycan$G.x*1.1
curve$x.end <- Glycan$G.x*1.15
curve$y.start <- Glycan$G.y*1.1
curve$y.end <- Glycan$G.y*1.15
# 给弧线之间留个空隙,根据自己的数据调整数量吧 Leave a gap between the arcs and adjust the number according to your data
curve <- curve[-c(9:10, #9是第一类Paucimannose的数量,以此类推 9 is the number of Paucimannose of the first class, and so on
45:46,73:74,107:108),]
head(curve)
## Gly.clu x.start x.end y.start y.end
## 1 Paucimannose 109.7522 114.7410 409.6008 428.2191
## 2 Paucimannose 128.1226 133.9464 404.2314 422.6055
## 3 Paucimannose 146.2321 152.8790 398.0384 416.1311
## 4 Paucimannose 164.0436 171.5001 391.0346 408.8089
## 5 Paucimannose 181.5210 189.7720 383.2343 400.6540
## 6 Paucimannose 198.6286 207.6572 374.6533 391.6830
根据glycosites数量分成5类,其中4和5一组,>5一组,与右侧图例对应。 According to the number of glycosites, they are divided into 5 categories, of which 4 and 5 are grouped, and > 5 is grouped, corresponding to the legend on the right.
protein.pos <- protein[,1:2]
protein.pos <- arrange(protein.pos, Count)
protein.pos$P.x <- 0
protein.pos$P.y <- seq(floor(nrow(protein)/2),-floor(nrow(protein)/2))
protein.pos$cluster <- ifelse(protein.pos$Count < 4, protein.pos$Count, ifelse(protein.pos$Count < 6, "4", "5"))
head(protein.pos)
## Protein Count P.x P.y cluster
## 1 Q9Z2A9 1 0 385 1
## 2 Q9WVT6 1 0 384 1
## 3 Q9WVB4 1 0 383 1
## 4 Q9WV91 1 0 382 1
## 5 Q9WUH7 1 0 381 1
## 6 Q9R1V6-17 1 0 380 1
connect <- protein[,c(1,3)]
Gly.name <- strsplit(as.character(connect[,2]),split=";")
connect.count <- sapply(Gly.name,length)
connection <- data.frame(Protein=rep(connect$Protein,connect.count),Glycan=unlist(Gly.name))
seg.clu主要是为了画最后一个Glycan那一点点黄色 seg.clu is mainly used to draw a little bit of yellow at the end of the Glycan
all <- merge(connection,Glycan,by="Glycan")
all1 <- merge(all,protein.pos,by="Protein")
all1$seg.clu <- ifelse(all1$Glycan=="HexNAc(2)Hex(6)Phospho(1)","zz",all1$Gly.clu)
head(all1)
## Protein Glycan Node Gly.clu angle
## 1 A0A087WPX3 HexNAc(1) 1 Paucimannose 75.00000
## 2 A0A087WPX3 HexNAc(6)Hex(4) 60 Complex/Hybrid -107.58621
## 3 A0A087WPX3 HexNAc(6)Hex(3) 59 Complex/Hybrid -75.00000
## 4 A0A087WPX3 HexNAc(3)Hex(3)Fuc(1) 77 Fucosylated -151.55172
## 5 A0A087WPX3 HexNAc(2) 2 Paucimannose 72.41379
## 6 A0A087WPX3 HexNAc(2)Hex(9) 111 High Mannose -239.48276
## G.x G.y Count P.x P.y cluster seg.clu
## 1 99.77474 372.3644 27 0 -378 5 Paucimannose
## 2 -116.47513 -367.4831 27 0 -378 5 Complex/Hybrid
## 3 99.77474 -372.3644 27 0 -378 5 Complex/Hybrid
## 4 -338.94992 -183.6388 27 0 -378 5 Fucosylated
## 5 116.47513 367.4831 27 0 -378 5 Paucimannose
## 6 -195.75598 332.0991 27 0 -378 5 High Mannose
# 自定义颜色 Custom colors
cols <- c("royalblue4","gray80","seagreen3","powderblue","steelblue","goldenrod1")
fills <- c("#FFFFD4","#FEE391","#FEC44F","#FE9929","#D95F0E")
# 查看颜色 Look at the colors
library(scales)
show_col(cols)
show_col(fills)
# 画点 Draw dots
p <- ggplot(all1) +
geom_point(aes(G.x,G.y,color=Gly.clu))
p
# 画连线 Draw a line
p1 <- p +
geom_segment(aes(x=G.x,y=G.y,xend=P.x,yend=P.y,color=seg.clu),alpha=0.1)
p1
# 画矩形 Draw a rectangle
p2 <- p1 +
geom_rect(aes(xmin=-15,xmax=15,ymin=P.y,ymax=P.y+1,fill=cluster)) +
geom_rect(aes(xmin=500,xmax=530,ymin=P.y,ymax=P.y+1,fill=cluster))
p2
# 画弧线 Draw an arc
p3 <- p2 +
geom_segment(data=curve,aes(x=x.start,xend=x.end,y=y.start,yend=y.end,color=Gly.clu),size=5)
p3
# 清空背景和坐标轴 Clear the background and axis
p4 <- p3 +
theme_minimal() +
theme(
legend.position = "none",
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank()
)
p4
# 用自定义颜色填充和画线 Fill and draw lines with custom colors
p5 <- p4 +
scale_color_manual(values=cols) +
scale_fill_manual(values=fills)
p5
# 输出到文件 Output to a file
pdf(file="target.pdf", height=5, width=5.5)
p5
dev.off()
## quartz_off_screen
## 2
输出的PDF文件是矢量图,可以用PS或AI等矢量图编辑器打开,编辑图形和文字。 The output PDF file is a vector image, which can be opened with a vector image editor such as PS or AI to edit graphics and text.
其实本来文字和数字都是可以在R中加的,但是还需要计算位置,莫不如在PS里修改。这里有一个PS的中间文件可以参考target.psd和target.tif。 In fact, words and numbers can be added in R, but the position needs to be calculated, so it is better to modify it in PS. Here is an intermediate file for PS that can refer to target.psd and target.tif.
在用到自己数据的时候,最好先把原数据跑一遍,理解每个数字的含义,和整体的思路,再去举一反三。这里涉及到一些数学的三角函数计算,和角度的计算,还是要理解每个数字的含义的。 When using your own data, it is best to run the original data first, understand the meaning of each number, and the overall idea, and then draw inferences. Here it involves some mathematical trigonometric calculations and angle calculations, or to understand the meaning of each number.
sessionInfo()
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS 10.15
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] scales_1.0.0 dplyr_0.8.3 ggplot2_3.2.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.2 knitr_1.24 magrittr_1.5 tidyselect_0.2.5
## [5] munsell_0.5.0 colorspace_1.4-1 R6_2.4.0 rlang_0.4.0
## [9] stringr_1.4.0 tools_3.6.0 grid_3.6.0 gtable_0.3.0
## [13] xfun_0.9 withr_2.1.2 htmltools_0.3.6 yaml_2.2.0
## [17] lazyeval_0.2.2 digest_0.6.20 assertthat_0.2.1 tibble_2.1.3
## [21] crayon_1.3.4 purrr_0.3.2 glue_1.3.1 evaluate_0.14
## [25] rmarkdown_1.15 labeling_0.3 stringi_1.4.3 compiler_3.6.0
## [29] pillar_1.4.2 pkgconfig_2.0.2